{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Bin Packing"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Pre-requisites \n",
    "\n",
    "### Imports\n",
    "\n",
    "To get started, we'll import the Python libraries we need, set up the environment with a few prerequisites for permissions and configurations."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "import sys\n",
    "import os\n",
    "\n",
    "from sagemaker.rl import RLEstimator\n",
    "\n",
    "sys.path.append(\"common\")\n",
    "from misc import get_execution_role"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Setup S3 bucket\n",
    "\n",
    "Set up the linkage and authentication to the S3 bucket that you want to use for checkpoint and the metadata."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "sage_session = sagemaker.session.Session()\n",
    "s3_bucket = sage_session.default_bucket()  \n",
    "s3_output_path = 's3://{}/'.format(s3_bucket)\n",
    "print(\"S3 bucket path: {}\".format(s3_output_path))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Define Variables \n",
    "\n",
    "We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# create a descriptive job name \n",
    "job_name_prefix = 'binpacking-actionmask-apex-B100-LW-p3-8x'"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Configure where training happens\n",
    "You can train your RL training jobs using the SageMaker notebook instance or local notebook instance. In both of these scenarios, you can run the following in either local or SageMaker modes. The local mode uses the SageMaker Python SDK to run your code in a local container before deploying to SageMaker. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`. When setting `local_mode = False`, you can choose the instance type from avaialable [ml instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "local_mode = False\n",
    "\n",
    "if local_mode:\n",
    "    instance_type = 'local'\n",
    "else:\n",
    "    # If on SageMaker, pick the instance type\n",
    "    instance_type = \"ml.m5.large\""
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Create an IAM role\n",
    "Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except:\n",
    "    role = get_execution_role()\n",
    "\n",
    "print(\"Using IAM role arn: {}\".format(role))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Install docker for `local` mode\n",
    "\n",
    "In order to work in `local` mode, you need to have docker installed. When running from you local machine, please make sure that you have docker and docker-compose (for local CPU machines) and nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can simply run the following script to install dependenceis.\n",
    "\n",
    "Note, you can only run a single local notebook at one time."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# only run from SageMaker notebook instance\n",
    "if local_mode:\n",
    "    !/bin/bash ./common/setup.sh"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Define the Sagemaker-RL docker image for Ray\n",
    "\n",
    "We will use the `TensorFlow Ray` for this project. For a list of available RL images see https://github.com/aws/sagemaker-rl-container "
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n",
    "aws_region = boto3.Session().region_name\n",
    "custom_image_name = \"520713654638.dkr.ecr.{}.amazonaws.com/sagemaker-rl-tensorflow:ray0.6.5-{}-py3\".format(aws_region, cpu_or_gpu)\n",
    "print(\"Using ECR image %s\" % custom_image_name)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Train the RL model using the Python SDK Script mode\n",
    "\n",
    "If you are using local mode, the training will run on the notebook instance. When using SageMaker for training, you can select a GPU or CPU instance. The [RLEstimator](https://sagemaker.readthedocs.io/en/stable/sagemaker.rl.html) is used for training RL jobs. \n",
    "\n",
    "1. Specify the source directory where the gym environment and training code is uploaded.\n",
    "2. Specify the entry point as the training code \n",
    "3. Specify the choice of RL toolkit and framework. This automatically resolves to the ECR path for the RL Container. \n",
    "4. Define the training parameters such as the instance count, job name, S3 path for output and job name. \n",
    "5. Specify the hyperparameters for the RL agent algorithm. The RLCOACH_PRESET or the RLRAY_PRESET can be used to specify the RL agent algorithm you want to use. \n",
    "6. Define the metrics definitions that you are interested in capturing in your logs. These can also be visualized in CloudWatch and SageMaker Notebooks."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Define Metric\n",
    "A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "metric_definitions = [{'Name': 'episode_reward_mean',\n",
    "  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_reward_max',\n",
    "  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_len_mean',\n",
    "  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'entropy',\n",
    "  'Regex': 'entropy: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'episode_reward_min',\n",
    "  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'vf_loss',\n",
    "  'Regex': 'vf_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
    " {'Name': 'policy_loss',\n",
    "  'Regex': 'policy_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                            \n",
    "]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Define Estimator\n",
    "This Estimator executes an RLEstimator script in a managed Reinforcement Learning (RL) execution environment within a SageMaker Training Job. The managed RL environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "train_entry_point = \"train_ppo.py\"\n",
    "train_job_max_duration_in_seconds = 3600 * 24 * 2\n",
    "\n",
    "estimator = RLEstimator(entry_point=train_entry_point,\n",
    "                        source_dir=\"src\",\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=1,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=job_name_prefix,\n",
    "                        metric_definitions=metric_definitions,\n",
    "                        train_max_run=train_job_max_duration_in_seconds,\n",
    "                        hyperparameters={}\n",
    "                       )"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "estimator.fit(wait=local_mode)\n",
    "job_name = estimator.latest_training_job.job_name\n",
    "print(\"Training job: %s\" % job_name)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Visualization\n",
    "\n",
    "RL training can take a long time.  So while it's running there are a variety of ways we can track progress of the running training job.  Some intermediate output gets saved to S3 during training, so we'll set up to capture that."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "s3_url = \"s3://{}/{}\".format(s3_bucket,job_name)\n",
    "\n",
    "intermediate_folder_key = \"{}/output/intermediate/\".format(job_name)\n",
    "intermediate_url = \"s3://{}/{}training/\".format(s3_bucket, intermediate_folder_key)\n",
    "\n",
    "print(\"S3 job path: {}\".format(s3_url))\n",
    "print(\"Intermediate folder path: {}\".format(intermediate_url))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "from sagemaker.analytics import TrainingJobAnalytics"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "if not local_mode:\n",
    "    df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()\n",
    "    df_min = TrainingJobAnalytics(job_name, ['episode_reward_min']).dataframe()\n",
    "    df_max = TrainingJobAnalytics(job_name, ['episode_reward_max']).dataframe()\n",
    "    df['rl_reward_mean'] = df['value']\n",
    "    df['rl_reward_min'] = df_min['value']\n",
    "    df['rl_reward_max'] = df_max['value']\n",
    "    num_metrics = len(df)\n",
    "    \n",
    "    if num_metrics == 0:\n",
    "        print(\"No algorithm metrics found in CloudWatch\")\n",
    "    else:\n",
    "        plt = df.plot(x='timestamp', y=['rl_reward_mean'], figsize=(18,6), fontsize=18, legend=True, style='-', color=['b','r','g'])\n",
    "        plt.fill_between(df.timestamp, df.rl_reward_min, df.rl_reward_max, color='b', alpha=0.2)\n",
    "        plt.set_ylabel('Mean reward per episode', fontsize=20)\n",
    "        plt.set_xlabel('Training time (s)', fontsize=20)\n",
    "        plt.legend(loc=4, prop={'size': 20})\n",
    "else:\n",
    "    print(\"Can't plot metrics in local mode.\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "conda_tensorflow_p36",
   "language": "python",
   "name": "conda_tensorflow_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}